Workshop FF UK 5.10.2023 2️⃣
Selected Chapters in Data Analysis
LMU Munich
📫 renata.topinkova[at]lmu.de
API = Application Programming Interface
Source: https://www.geeksforgeeks.org/what-is-an-api/
Isn’t there an R package for that?
📦 WHO, guardianapi, spotifyR, nytimes, wbstats, RedditExtractoR
Are you sure?
Google, Github
If you’re SURE sure… Generic package
📦 httr, httr2
📖 STEP 1 : Read the documentation
Endpoint = designated point for data collection (often > 1)
Parameters = How can I narrow down what I want to get? What can I get? What values does the API accept?
Authentication = Do I need an API token? How do I get it? Where do I put it?
Rate limits = How much can I download in a minute/day?
ToS = What are you allowed to do with the data? Can you publish it? In what form?
Starts with…
If unsure, check doggos
1. Specify the endpoint for your query
2. Specify the query itself
req_url_query() or req_url_path()
3. Authenticate if the API requires it
No one size fits all - depends on the API, read through the documentation & hope for the best
Different functions available:
req_auth_bearer_token()
req_oauth_* functions for OAuth
req_headers()
4. OPTIONAL: Test it out, see what you are planning to send
5. Send the request - req_perform()
Until req_perform() is called, nothing gets sent to the API!
6. Parse the response
resp_body_* (json, xml, html)
Leaving the httr2…
7. Wrangle the data - as_tibble, map_*, bind_rows, and others
8. Unnest if needed - unnest, unnest_wider, unnest_longer
9. Analysis!
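The steps above can be sketched with httr2 roughly as follows. This is a minimal sketch, assuming the nationalize.io host that appears in the dry-run output and assuming it needs no authentication for light use. `fetch_json()` is a hypothetical helper name, and `MY_API_TOKEN` is a made-up environment variable shown only to illustrate where a token would go.

```r
library(httr2)

# Steps 1-2: specify the endpoint and the query
req <- request("https://api.nationalize.io") |>
  req_url_query(name = "Renata")

# Step 3 (only if the API requires it): attach a token.
# MY_API_TOKEN is a hypothetical environment variable name.
# req <- req |> req_auth_bearer_token(Sys.getenv("MY_API_TOKEN"))

# Step 4 (optional): preview the request without sending it
# req_dry_run(req)

# Steps 5-6: send the request and parse the JSON body.
# Hypothetical helper; nothing is sent until it is called.
fetch_json <- function(req) {
  resp <- req_perform(req)
  resp_body_json(resp)
}

# Usage (needs an internet connection):
# parsed <- fetch_json(req)
```

Building the request and sending it are kept apart here so the request can be inspected, or adjusted, before anything reaches the API.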
req_url_path() adds /
req_url_query() adds ? after endpoint, key-value pairs are separated by &
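To see how the two functions shape the URL, the request can be built step by step and its `url` field inspected. A small sketch, assuming a purely hypothetical API at `https://api.example.com` with a made-up path and made-up parameters:

```r
library(httr2)

# Hypothetical base URL, used only to show how the URL is assembled
req <- request("https://api.example.com")

# req_url_path() sets the path that follows the host
req_path <- req |> req_url_path("/v1/items")

# req_url_query() appends ?key=value pairs, joined by &
req_full <- req_path |> req_url_query(type = "movie", page = 2)

# Inspect the assembled URLs
req_path$url
req_full$url
```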
GET / HTTP/1.1
Host: api.nationalize.io?name=Renata
User-Agent: httr2/0.2.3 r-curl/5.0.1 libcurl/7.84.0
Accept: */*
Accept-Encoding: deflate, gzip
If we wanted to change/add a header, we could do it with req_headers()
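A minimal sketch of adjusting headers, assuming the same nationalize.io request as in the dry run. The header value and the user-agent string are illustrative choices, not something the API is known to require.

```r
library(httr2)

req <- request("https://api.nationalize.io") |>
  req_url_query(name = "Renata")

# Add or overwrite headers; the original req is left untouched
req_custom <- req |>
  req_headers(Accept = "application/json") |>
  # Illustrative user-agent string (hypothetical contact address)
  req_user_agent("workshop-script (me@example.com)")

# Header names now attached to the modified request
names(req_custom$headers)
```

Because every `req_*` function returns a modified copy, the result has to be assigned to an object for the change to take effect.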
Make sure you assign it to a new object!
Explore what you got
resp_body_* (json, xml, html)
$count
[1] 172839
$name
[1] "Renata"
$country
$country[[1]]
$country[[1]]$country_id
[1] "CZ"
$country[[1]]$probability
[1] 0.168
$country[[2]]
$country[[2]]$country_id
[1] "BR"
$country[[2]]$probability
[1] 0.144
$country[[3]]
$country[[3]]$country_id
[1] "PL"
$country[[3]]$probability
[1] 0.132
$country[[4]]
$country[[4]]$country_id
[1] "LT"
$country[[4]]$probability
[1] 0.084
$country[[5]]
$country[[5]]$country_id
[1] "SK"
$country[[5]]$probability
[1] 0.076
Often useful to examine the structure - can help us figure out why wrangling is failing
List of 3
$ count : int 172839
$ name : chr "Renata"
$ country:List of 5
..$ :List of 2
.. ..$ country_id : chr "CZ"
.. ..$ probability: num 0.168
..$ :List of 2
.. ..$ country_id : chr "BR"
.. ..$ probability: num 0.144
..$ :List of 2
.. ..$ country_id : chr "PL"
.. ..$ probability: num 0.132
..$ :List of 2
.. ..$ country_id : chr "LT"
.. ..$ probability: num 0.084
..$ :List of 2
.. ..$ country_id : chr "SK"
.. ..$ probability: num 0.076
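Once the structure is known, individual values can be pulled out by position or name. A small sketch using purrr on a hard-coded copy of the first three countries from the output above, so that it runs without calling the API; the object name `parsed` is an assumption.

```r
library(purrr)

# Hard-coded copy of part of the parsed response shown above
parsed <- list(
  count = 172839L,
  name = "Renata",
  country = list(
    list(country_id = "CZ", probability = 0.168),
    list(country_id = "BR", probability = 0.144),
    list(country_id = "PL", probability = 0.132)
  )
)

# Limit the depth when the full structure is too long to read
str(parsed, max.level = 2)

# Walk into the list: element "country", first entry, field "country_id"
top_country <- pluck(parsed, "country", 1, "country_id")

# Extract one field from every entry as a plain vector
country_ids <- map_chr(parsed$country, "country_id")
probabilities <- map_dbl(parsed$country, "probability")
```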
unnest, unnest_wider, unnest_longer
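One possible way to turn the nested list into a flat table is to put the `country` entries into a list-column and spread them with `unnest_wider()`. A sketch on a hard-coded copy of the response above; the object names are assumptions, and other routes (for example `bind_rows()`) work as well.

```r
library(tibble)
library(tidyr)

# Hard-coded copy of the parsed response shown above
parsed <- list(
  count = 172839L,
  name = "Renata",
  country = list(
    list(country_id = "CZ", probability = 0.168),
    list(country_id = "BR", probability = 0.144),
    list(country_id = "PL", probability = 0.132),
    list(country_id = "LT", probability = 0.084),
    list(country_id = "SK", probability = 0.076)
  )
)

# One row per country; each cell of `country` is a small named list
nationality <- tibble(name = parsed$name, country = parsed$country) |>
  # Spread the named list into the columns country_id and probability
  unnest_wider(country)

nationality
```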
There are many other useful functions in httr2, look up the documentation
All functions requesting something start with req_*, all functions working with the response start with resp_*
req_throttle(), req_retry(), resp_is_error()… etc.
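A hedged sketch of how these helpers might be combined. The rate of 30 requests per minute and the three attempts are illustrative numbers, not limits documented by any particular API, and `fetch_or_null()` is a hypothetical helper name.

```r
library(httr2)

req <- request("https://api.nationalize.io") |>
  req_url_query(name = "Renata") |>
  # Illustrative limit: at most 30 requests per 60 seconds
  req_throttle(rate = 30 / 60) |>
  # Try up to 3 times on transient failures
  req_retry(max_tries = 3)

# Hypothetical helper: return parsed JSON, or NULL if the API reports an error
fetch_or_null <- function(req) {
  resp <- req |>
    # Do not turn HTTP errors into R errors; check the response ourselves
    req_error(is_error = function(resp) FALSE) |>
    req_perform()
  if (resp_is_error(resp)) {
    warning("Request failed with status ", resp_status(resp))
    return(NULL)
  }
  resp_body_json(resp)
}

# Usage (needs an internet connection):
# parsed <- fetch_or_null(req)
```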
an API for predicting nationality from a name
a simple API to predict the gender of a person given their name
May seem silly but…
e.g., Holman et al. (2018) - estimating gender gap in science
Open the 02_1_API_wo_package_exercise.qmd file.
Note
Make sure to make a project where your work will reside.
25:00
The OMDb API is a RESTful web service to obtain movie information, all content and images on the site are contributed and maintained by our users.
Open the 02_2_API_wo_package_exercise.qmd file.
Note
Make sure you place your API key inside your project.
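One possible way to keep the key out of the script itself is to read it from an environment variable. A sketch, assuming the key is stored under the hypothetical name `OMDB_API_KEY` (for example in a project-level `.Renviron` file) and using the `apikey` and `t` (title) query parameters of the OMDb API; `omdb_request()` is a hypothetical helper.

```r
library(httr2)

# Hypothetical helper: build (but do not send) an OMDb title lookup
omdb_request <- function(title, key = Sys.getenv("OMDB_API_KEY")) {
  if (!nzchar(key)) {
    stop("No API key found; set OMDB_API_KEY, e.g. in a project-level .Renviron.")
  }
  request("https://www.omdbapi.com/") |>
    req_url_query(apikey = key, t = title)
}

# Usage (needs a valid key and an internet connection):
# movie <- omdb_request("Arrival") |> req_perform() |> resp_body_json()
```

If the key lives in a file inside the project, it is worth keeping that file out of anything you share or publish.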
25:00
Webscraping in R 2023 - Renata Topinkova